We propose that hypergraphs can be used to model social networks with overlapping communities. The nodes of the hypergraphs represent the communities. The hyperlinks of the hypergraphs denote the individuals who may participate in multiple communities. Hypergraphs are not easy to analyze; however, the line graphs of hypergraphs are simple graphs or weighted graphs, so that network theory can be applied. We define the overlapping depth $k$ of an individual by the number of communities that overlap in that individual, and we prove that the minimum adjacency eigenvalue of the corresponding line graph is not smaller than $-k_{\max}$, where $k_{\max}$ is the maximum overlapping depth of the whole network. Based on hypergraphs with preferential attachment, we establish a network model which incorporates overlapping communities with tunable overlapping parameters $k$ and $w$. By comparing with the Hyves social network, we show that our social network model possesses high clustering, assortative mixing, power-law degree distribution and short average path length.



1 Introduction

Social networks, as one type of real-world complex networks, are currently widely studied [1] [2] [3] [4]. Most social networks have common properties of real-world networks, such as a high clustering coefficient, short characteristic path length and power-law degree distribution [1] [2] [5] [6]. Meanwhile, they possess some special properties like assortative mixing, community and hierarchical structure [1] [2] [?] [9]. Communities are the subunits of a network, which exhibit relatively higher levels of connections within the subunits and a lower connectivity between the subunits. Community structures feature important topological properties that have catalyzed research on community detection algorithms and on modularity analysis [10] [11] [12]. The communities overlap with each other when nodes belong to multiple communities. The overlap of different communities exists naturally in real-world complex networks, particularly in social and biological networks [13] [14] [15]. The overlap is present at the interface between communities and could also be pervasive in the whole network. The existence of overlapping communities challenges the traditional algorithms and methods [10] for community detection and network (node) partitioning. Ahn et al. [7] and Evans et al. [16] proposed that partitioning the links of the concerned network could avoid overlapping communities. Actually, this method only works when two communities overlap in at most one node, as shown in Figure 1(a). If two communities overlap in two or more nodes, they also overlap in links, as shown in Figure 1(b), where the thick black links belong to two communities.

We propose that hypergraphs and line graphs of hypergraphs can be used to model networks with overlapping communities. A hypergraph is the generalization of a simple graph. A hypergraph $H(N, L)$ has the same types of nodes as a simple graph [17], but its hyperlinks¹ can connect a variable number $k$ of nodes, $k = 1, 2, 3, \ldots$. Here $N$ and $L$ denote the number of nodes and hyperlinks respectively. The line graph $l(H)$ of a hypergraph $H(N, L)$ is a graph in which every node of $l(H)$ represents a hyperlink of $H(N, L)$ and two nodes of $l(H)$ are adjacent if and only if their corresponding hyperlinks share node(s) in $H(N, L)$ [18]. As discussed in Section 3, the line graph $l(H)$ is a simple graph² when $H(N, L)$ is linear³; otherwise $l(H)$ is a weighted graph. Applying the concepts to communities, we have that:

• Hypergraphs: The nodes represent the communities; the hyperlinks denote the individuals, who may belong to multiple communities. If an individual belongs to several communities, the corresponding nodes are connected by the corresponding hyperlink.

• Line graphs of hypergraphs: The nodes represent the individuals. The communities consist of the participating nodes and all the links inter-connecting them. Two individuals are connected by a link if they belong to the same community. All the communities are cliques in the line graph.
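The two views listed above can be illustrated with a minimal sketch. The names and memberships are hypothetical toy data, not taken from the paper, and the helper functions `line_graph_edges` and `community_members` are introduced here only for illustration. The sketch builds the links of the line graph from a membership table and checks that each community induces a clique.

```python
from itertools import combinations

# Hypothetical toy data: each individual (hyperlink) lists the
# communities (hypergraph nodes) it belongs to.
membership = {
    "alice": {"lab", "band"},
    "bob": {"lab"},
    "carol": {"lab", "club"},
    "dave": {"band", "club"},
}

def line_graph_edges(membership):
    """Connect two individuals when they share at least one community."""
    edges = set()
    for u, v in combinations(sorted(membership), 2):
        if membership[u] & membership[v]:
            edges.add((u, v))
    return edges

def community_members(membership):
    """Invert the mapping: community -> set of individuals."""
    members = {}
    for person, comms in membership.items():
        for c in comms:
            members.setdefault(c, set()).add(person)
    return members

edges = line_graph_edges(membership)
members = community_members(membership)

# Every community should induce a clique in the line graph.
is_clique = {
    c: all((u, v) in edges for u, v in combinations(sorted(people), 2))
    for c, people in members.items()
}
```

In this toy case "bob" and "dave" share no community, so they are the only pair left unconnected, while every community is fully connected internally.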

By using hypergraphs and their line graphs, we establish in this article a network model which, for the first time, incorporates overlapping community structures with tunable overlapping parameters: the overlapping depth $k$ and the overlapping width $w$ (defined in Section 2.1). By introducing preferential attachment to hypergraphs, we obtain a power-law community size distribution and a power-law degree distribution. Our network model also possesses high clustering, assortative mixing and short average path length. We compare the mentioned metrics of our model with the corresponding metrics of an online social network retrieved from a part of the public profiles of Hyves, a popular Dutch social networking site.

2 Hypergraphs modeling social networks with overlapping communities

2.1 The overlapping parameters for communities 

Human beings have multiple roles in society, and these roles make people members of multiple communities at the same time, such as companies, universities, families/relationships, hobby clubs, etc. Proteins may also be involved in multiple functional categories in biological networks. That is how overlapping communities emerge in social and biological networks. Sometimes only two communities overlap in the same node, and sometimes a huge number of communities overlap in the same node. Two communities may overlap in only one node, and they may also overlap in many nodes.

Figure 1: Example graphs showing (a) the overlapping depths of nodes and (b) the overlapping widths of two communities. The nodes denote the individuals. The communities consist of links of the same color and the shared thick black link(s) and the nodes incident to the links.

¹ The hyperlinks here should not be confused with hyperlinks of WWW webs. Some papers call them hyperedges.
² A simple graph is an unweighted, undirected graph containing no self-loops (links starting and ending at the same node) nor multiple links between the same pair of nodes.
³ A hypergraph is linear if each pair of hyperlinks share at most one node. Hypergraphs where all hyperlinks connect the same number $k$ of nodes are defined as $k$-uniform hypergraphs. A 2-uniform hypergraph is a simple graph.

Definition 1 We define the overlapping depth $k$ of an individual by the number of communities that overlap in that individual.

Definition 2 We define the overlapping width $w$ of two communities by the number of individuals in which they overlap.

The nodes in Figure 1 denote the individuals. There are five individuals in Figure 1(a) which have at least two communities overlapping in them. Their overlapping depths are 5, 3, 2, 2, 2. As shown in Figure 1(b), the overlapping widths of four community pairs, red and brown, red and dark blue, green and dark blue, brown and green, are 3, 2, 2, 1. The individuals of a social network modeled by a $k$-uniform hypergraph all belong to $k$ different communities; hence, the overlapping depths of all hyperlinks of a $k$-uniform hypergraph are $k$. The overlapping width of any node pair of a linear hypergraph is not larger than 1, regarding nodes as communities and hyperlinks as individuals.
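As a small sketch of Definitions 1 and 2, the overlapping depths and widths can be read off a membership matrix. The matrix `R` below is a hypothetical example (rows are communities, columns are individuals), not data from the paper, and the variable names are chosen here for illustration.

```python
import numpy as np

# Hypothetical membership matrix (3 communities, 4 individuals).
# R[j, i] = 1 when individual i belongs to community j.
R = np.array([
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 1],
])

# Overlapping depth of each individual: number of communities it is in
# (column sums of R).
depth = R.sum(axis=0)

# Overlapping width of each community pair: number of shared individuals
# (off-diagonal entries of R R^T; the diagonal holds community sizes).
width = R @ R.T

k_max = depth.max()
is_uniform = np.all(depth == depth[0])          # all depths equal?
off_diag = width - np.diag(np.diag(width))
is_linear = off_diag.max() <= 1                 # all widths at most 1?
```

Here the depths are 2, 1, 3, 2, so the example is not uniform, and two community pairs have width 2, so it is not linear either.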

2.2 Modeling social networks 

The hyperlinks and nodes represent the individuals and the communities respectively. People may participate in multiple communities. If an individual belongs to several communities, the corresponding nodes are connected by the corresponding hyperlink. We show how a hypergraph models a real social network by the example of Figure 2. This is a small social network of the research group NAS at TU Delft. Despite its small size, overlapping communities still emerge. In Figure 2 there are 12 communities, as described in Table 1, and there are 54 individuals, among whom 6 individuals belong to the NAS group and possess overlapping depths of 5, 3, 3, 2, 2, 2. The 7th individual is a member of both a rock band and a soccer team.

Figure 2: An example of a hypergraph modeling a small real social network. The hyperlinks and nodes represent the individuals and the communities respectively. Each individual may participate in multiple communities; in other words, the communities overlap with each other.

Hypergraphs are too complicated for direct network analysis; however, the line graphs of hypergraphs are simple graphs or weighted graphs, whose properties are easier to investigate.

3 The line graphs of hypergraphs 

We store a hypergraph by its unsigned incidence matrix, which is defined as an $N \times L$ matrix $R$ with the entries $r_{j_1 i} = r_{j_2 i} = \cdots = r_{j_k i} = 1$ and the other entries of the $i$th column being 0, when hyperlink $i$ is incident to nodes $j_1, j_2, \ldots, j_k$.

Definition 3 The line graph of a linear hypergraph $H(N, L)$ is defined as a graph $l(H)$, of which the node set is the set of the hyperlinks of the hypergraph and two nodes are connected by an unweighted link when the corresponding hyperlinks share one node.

Definition 4 The line graph of a nonlinear hypergraph $H(N, L)$ is defined as a graph $l(H)$, of which the node set is the set of the hyperlinks of the hypergraph and two nodes are connected by a link of weight $t$ when the corresponding hyperlinks share $t$ node(s).

We observe that the line graph $l(H)$ is a simple graph when $H(N, L)$ is linear, and that $l(H)$ of a nonlinear hypergraph $H(N, L)$ is a weighted graph. The adjacency matrix of the line graph of a hypergraph can be computed from the unsigned incidence matrix of the hypergraph.
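A minimal sketch of that computation, under the assumption that the off-diagonal entries of $R^T R$ give the link weights of Definitions 3 and 4 and that the diagonal is cleared because a line graph has no self-loops. The function name `line_graph_adjacency` and both example matrices are hypothetical.

```python
import numpy as np

def line_graph_adjacency(R):
    """Return the (possibly weighted) adjacency matrix of the line graph.

    R is the N x L unsigned incidence matrix. Entry (i, j) of R^T R counts
    the nodes shared by hyperlinks i and j; the diagonal (the overlapping
    depths) is cleared because the line graph has no self-loops.
    """
    R = np.asarray(R)
    W = R.T @ R
    np.fill_diagonal(W, 0)
    return W

# Hypothetical linear hypergraph: 3 nodes, 3 hyperlinks.
R_linear = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
])
A_linear = line_graph_adjacency(R_linear)        # entries are 0 or 1

# Hypothetical nonlinear hypergraph: hyperlinks 0 and 1 share two nodes.
R_nonlinear = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
])
A_nonlinear = line_graph_adjacency(R_nonlinear)  # link (0, 1) has weight 2
```

The linear example yields an unweighted triangle, while the nonlinear one yields a weighted graph, matching the distinction between Definitions 3 and 4.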

In Figure 3 we show the line graph of the hypergraph of Figure 2. As depicted, there are 12 communities, of which 5 communities have 6 members and 7 communities have 5 members. Table 2 shows the members of all the communities of the network in Figure 3. We see that the line graph displays the community structure and the overlap better.
Nodes   Communities
n1      TU Delft research group-NAS
n2      MIT research group
n3      Cornell Univ. research group
n4      IEEE/ACM ToN editorial board
n5      Kansas State Univ. research group
n6      Ericsson
n7      KPN (Dutch Telecom)
n8      Piano club
n9      TNO (A Dutch consulting company)
n10     A rock band
n11     A soccer team
n12     TU Delft research group-Bioinformatics

Table 1: The details of all communities of the NAS social network.



Communities   Individuals
n1            l1 to l6
n2            l1, l8 to l12
n3            l1, l13 to l17
n4            l1, l18 to l22
n5            l1, l23 to l27
n6            l2, l28 to l31
n7            l3, l32 to l35
n8            l3, l36 to l39
n9            l4, l40 to l43
n10           l4, l7, l44 to l46
n11           l5, l7, l47 to l50
n12           l6, l51 to l54

Table 2: The members of all the communities of the NAS social network.






Figure 3: The line graph of the hypergraph in Figure 2. The nodes here denote the individuals, while the communities consist of links of the same color and the shared thick black link(s) and the nodes incident to the links.

4 The relation between the maximum overlapping depth $k_{\max}$ and the smallest adjacency eigenvalue of the corresponding line graph

4.1 The line graph of a linear and $k$-uniform hypergraph $H_k(N, L)$

Since $H_k(N, L)$ is $k$-uniform, the unsigned incidence matrix $R$ of $H_k(N, L)$ has exactly $k$ 1-entries and $N - k$ 0-entries in each column, and we have $k_{\max} = k$. Hence, all the diagonal entries of $R^T R$ are $k$. Due to the definition of linearity of hypergraphs, two columns of $R$ of $H_k(N, L)$ have at most one 1-entry in the same row. Hence, all the non-diagonal entries of $R^T R$ are either 1 or 0. In addition, $R^T R$ is a Gram matrix [19] [20]. Therefore the adjacency matrix of the line graph of a linear and $k$-uniform hypergraph $H_k(N, L)$ is,

$$A_{l(H_k)} = R^T R - k I \qquad (1)$$

Both of the matrices $R^T R$ and $R R^T$ are positive semidefinite, since for any real vector $x$ of matching dimension,

$$x^T (R^T R) x = (R x)^T R x = \|R x\|_2^2 \ge 0$$
$$x^T (R R^T) x = (R^T x)^T R^T x = \|R^T x\|_2^2 \ge 0$$

All eigenvalues of $(R^T R)_{L \times L}$ are non-negative. Due to (1), the adjacency eigenvalues of the line graph of a linear and $k$-uniform hypergraph $H_k(N, L)$ are not smaller than $-k$, where $k$ is the overlapping depth.
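The bound can be checked numerically on a small case. The sketch below assumes a hypothetical example, the complete graph on four nodes viewed as a linear 2-uniform hypergraph, and uses NumPy's symmetric eigenvalue solver; it is an illustration, not the authors' computation.

```python
import numpy as np
from itertools import combinations

# Hypothetical example: the complete graph K4 seen as a linear,
# 2-uniform hypergraph (N = 4 nodes, L = 6 hyperlinks, k = 2).
N, k = 4, 2
links = list(combinations(range(N), k))
L = len(links)

R = np.zeros((N, L), dtype=int)
for i, nodes in enumerate(links):
    R[list(nodes), i] = 1

# Equation (1): adjacency matrix of the line graph.
A = R.T @ R - k * np.eye(L, dtype=int)

eigenvalues = np.linalg.eigvalsh(A)   # ascending order, A is symmetric
lambda_min = eigenvalues[0]

# The bound lambda_min >= -k should hold up to rounding error.
bound_holds = lambda_min >= -k - 1e-9
```

In this example the smallest eigenvalue equals $-2$, so the bound is attained.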

We have more results for linear and uniform networks. 

Lemma 5 (see [19]) For all matrices $A_{n \times m}$ and $B_{m \times n}$ with $n \ge m$, it holds that $\lambda(AB) = \lambda(BA)$ and $\lambda(AB)$ has $n - m$ extra zero eigenvalues,

$$\lambda^{n-m} \det(BA - \lambda I) = \det(AB - \lambda I)$$






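Using Lemma 5 with $A = R^T$ and $B = R$ (so that $n = L$ and $m = N$, which requires $L \ge N$), we have,

$$\det\left((R^T R)_{L \times L} - \lambda I\right) = \lambda^{L-N} \det\left((R R^T)_{N \times N} - \lambda I\right)$$

Using the definition of the adjacency matrix of the line graph in (1) yields,

$$\det\left(A_{l(H_k)} - (\lambda - k) I\right) = \lambda^{L-N} \det\left((R R^T)_{N \times N} - \lambda I\right)$$

or

$$\det\left(A_{l(H_k)} - \lambda I\right) = (\lambda + k)^{L-N} \det\left((R R^T)_{N \times N} - (\lambda + k) I\right) \qquad (2)$$

The adjacency matrix $A_{l(H_k)}$ has at least $L - N$ eigenvalues equal to $-k$, where $N$ is the number of communities and $L$ is the number of individuals. The matrix $R R^T$ is positive semidefinite; hence, the remaining eigenvalues of $A_{l(H_k)}$ are not smaller than $-k$.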
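Both Lemma 5 and the multiplicity statement following (2) can be illustrated numerically. The sketch below assumes a hypothetical example, the complete graph on five nodes viewed as a linear 2-uniform hypergraph with $N = 5$ and $L = 10$; the variable names are chosen here for illustration.

```python
import numpy as np
from itertools import combinations

# Hypothetical example: K5 as a linear, 2-uniform hypergraph.
N, k = 5, 2
links = list(combinations(range(N), k))
L = len(links)
R = np.zeros((N, L))
for i, nodes in enumerate(links):
    R[list(nodes), i] = 1.0

gram_small = R @ R.T            # N x N
gram_large = R.T @ R            # L x L
A = gram_large - k * np.eye(L)  # equation (1)

ev_small = np.linalg.eigvalsh(gram_small)
ev_large = np.linalg.eigvalsh(gram_large)
ev_A = np.linalg.eigvalsh(A)

# Lemma 5: R^T R has the N eigenvalues of R R^T plus L - N zeros.
expected = np.sort(np.concatenate([ev_small, np.zeros(L - N)]))
same_spectrum = np.allclose(ev_large, expected)

# Equation (2): at least L - N eigenvalues of A equal -k.
count_minus_k = int(np.sum(np.isclose(ev_A, -k)))
```

For this example the count of eigenvalues equal to $-k$ is exactly $L - N = 5$.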

4.2 The line graph of a linear and non-uniform hypergraph $H(N, L)$ with $k_{\max}$

Since the maximum overlapping depth of $H(N, L)$ is $k_{\max}$, the unsigned incidence matrix $R$ of $H(N, L)$ has at most $k_{\max}$ 1-entries in each column. Therefore, the largest diagonal entry of $R^T R$ is $k_{\max}$. The adjacency matrix of the line graph of a linear and non-uniform hypergraph $H(N, L)$ is,

$$A_{l(H)} = R^T R + C - k_{\max} I \qquad (3)$$

where $C = \mathrm{diag}(c_{11}, c_{22}, \ldots, c_{LL})$ and $c_{jj} \ge 0$, $1 \le j \le L$. By adding $C$ to $R^T R$, we make all the diagonal entries of $R^T R + C$ equal to $k_{\max}$.

We show that $R^T R + C$ is also positive semidefinite.

$$x^T (R^T R + C) x = \|R x\|_2^2 + \|\sqrt{C} x\|_2^2 \ge 0 \qquad (4)$$

where $x_{L \times 1}$ is an arbitrary vector and $\sqrt{C} = \mathrm{diag}(\sqrt{c_{11}}, \sqrt{c_{22}}, \ldots, \sqrt{c_{LL}})$. Hence, the adjacency eigenvalues of the line graph of a linear and non-uniform hypergraph $H(N, L)$ are not smaller than $-k_{\max}$, where $k_{\max}$ is the maximum overlapping depth of $H(N, L)$.
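The construction of the padding matrix $C$ can be made concrete with a short sketch. The hypergraph below is a hypothetical linear, non-uniform example with overlapping depths 3, 2, 2, 1; it is meant only to illustrate equations (3) and (4).

```python
import numpy as np

# Hypothetical linear, non-uniform hypergraph given as hyperlink -> node list.
hyperlinks = [[0, 1, 2], [0, 3], [1, 3], [2]]
N, L = 4, len(hyperlinks)

R = np.zeros((N, L))
for i, nodes in enumerate(hyperlinks):
    R[nodes, i] = 1.0

depth = R.sum(axis=0)              # diagonal of R^T R
k_max = depth.max()

# C pads every diagonal entry of R^T R up to k_max, so c_jj >= 0.
C = np.diag(k_max - depth)

# Equation (3).
A = R.T @ R + C - k_max * np.eye(L)

# R^T R + C is positive semidefinite, so the spectrum of A is >= -k_max.
psd_min = np.linalg.eigvalsh(R.T @ R + C)[0]
lambda_min = np.linalg.eigvalsh(A)[0]
```

The resulting `A` has a zero diagonal and 0/1 off-diagonal entries, as expected for the line graph of a linear hypergraph.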

4.3 The line graph of a nonlinear and non-uniform hypergraph $H(N, L)$ with $k_{\max}$

Since $H(N, L)$ is nonlinear, there are some pairs of hyperlinks sharing more than one node. If hyperlink $i$ and hyperlink $j$ share $t$ nodes, then, by the definition of the line graph of the hypergraph $H(N, L)$, the weight of the corresponding link between nodes $i$ and $j$ in the line graph is $t$. The line graph of a nonlinear hypergraph $H(N, L)$ becomes a weighted graph. In the language of social networks, the link weight of two individuals is $t$ if the two individuals are both members of $t$ communities. The adjacency matrix of the line graph of a nonlinear and non-uniform hypergraph $H(N, L)$ is,

$$A_{l(H)} = R^T R + C - k_{\max} I$$

where $C = \mathrm{diag}(c_{11}, c_{22}, \ldots, c_{LL})$ and $c_{jj} \ge 0$, $1 \le j \le L$. By adding $C$ to $R^T R$, we make all the diagonal entries of $R^T R + C$ equal to $k_{\max}$. We have proved that $R^T R + C$ is positive semidefinite; hence, the adjacency eigenvalues of the line graph of a nonlinear and non-uniform hypergraph $H(N, L)$ are also not smaller than $-k_{\max}$.
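Because the argument places no restriction on the incidence matrix, the bound can be spot-checked on random memberships. The sketch below is only a sanity check under assumed parameters (matrix sizes, membership probability 0.4 and the random seed are arbitrary choices), not a substitute for the proof.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, arbitrary choice

def min_eig_and_kmax(R):
    """Smallest adjacency eigenvalue of the weighted line graph, and k_max."""
    W = R.T @ R
    k_max = W.diagonal().max()
    A = W - np.diag(W.diagonal())    # same matrix as R^T R + C - k_max I
    return np.linalg.eigvalsh(A)[0], k_max

violations = 0
for _ in range(200):
    N, L = rng.integers(3, 8), rng.integers(3, 12)
    R = (rng.random((N, L)) < 0.4).astype(float)   # random memberships
    lam_min, k_max = min_eig_and_kmax(R)
    if lam_min < -k_max - 1e-9:
        violations += 1
```

Random incidence matrices of this kind are generally nonlinear and non-uniform, so they exercise the most general case of this section.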
Figure 4: The seed (the starting hypergraph) (a) and the growing element (b) we use in the simulation. At each time step a growing element is added to the existing hypergraph. Each hyperlink of the growing element has only one red circle, which means that it can connect to only one more node.



Network                                      $N$     $L$       $\alpha$   $C$    $C_r$   $\rho_D$   $l$
The Hyves social network                     10326   1260458   -0.88      0.84   0.024   0.29       6.7
The line graph of the generated hypergraph   1510    32031     -0.76      0.58   0.029   0.72       4.8



Table 3: The properties of a social network retrieved from Hyves and of the line graph of the hypergraph generated by our hypergraph model. The properties measured are: the total number of nodes $N$, the total number of links $L$, the exponent $\alpha$ of the power-law degree distribution, the clustering coefficient $C$, the assortativity coefficient $\rho_D$ (we have employed the formula in [9]) and the average path length $l$. For comparison we have included the clustering coefficient $C_r$ of an ER random graph with the same size and link density.



5 Hypergraphs with power-law degree distribution 

As a common property, the node degree of many large real-world networks, including social networks, follows a power-law distribution [1] [5]. To model social networks better, we need to incorporate the power-law degree distribution into our hypergraph model. We introduce network growth and preferential attachment to our hypergraph model.

By preferential attachment, we generate linear and non-uniform hypergraphs with overlapping depths of 2 and 3 only. Starting with a small hypergraph (with $m_0$ nodes, $m_0 \ge 4$), which we call a seed, at every time step we add a growing element which consists of three nodes, two hyperlinks of overlapping depth 2 and two hyperlinks of overlapping depth 3. The four hyperlinks connect all three nodes of a growing element to the existing hypergraph. Note that each of the four hyperlinks can connect to only one more node. The probability $\Pi$ that a hyperlink will connect to a node $i$ depends on the current degree $s_i$ of $i$: $\Pi(i) = s_i / \sum_j s_j$, where $\sum_j s_j$ is the sum of the degrees of all the existing nodes. In order to guarantee linearity, the four hyperlinks must connect to different existing nodes at each time step. Figure 4 shows the seed and the growing element we use in the simulation.
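The growth process can be sketched in a few lines. The exact seed and the internal wiring of the growing element are given only in Figure 4, so the sketch below makes its own assumptions, labeled in the code: a seed of four nodes joined in a cycle by depth-2 hyperlinks, and a growing element whose new nodes $a, b, c$ are wired as $\{a, b, x\}$, $\{b, c, y\}$, $\{a, z\}$, $\{c, w\}$. The function names `grow_hypergraph` and `incidence_matrix` are hypothetical. Drawing four distinct nodes is done with NumPy's weighted sampling without replacement, which is one possible reading of the rule that the four hyperlinks connect to different existing nodes.

```python
import numpy as np

def grow_hypergraph(steps, seed=0):
    """Grow a linear hypergraph with hyperlinks of depth 2 and 3.

    Assumptions (not taken from the paper's Figure 4): the seed is a
    4-cycle of depth-2 hyperlinks, and each growing element wires its
    three new nodes a, b, c as {a, b, x}, {b, c, y}, {a, z}, {c, w},
    where x, y, z, w are distinct existing nodes.
    """
    rng = np.random.default_rng(seed)
    hyperlinks = [(0, 1), (1, 2), (2, 3), (3, 0)]   # assumed seed
    degree = [2, 2, 2, 2]                            # hyperlinks per node

    for _ in range(steps):
        n = len(degree)
        p = np.array(degree, dtype=float)
        p /= p.sum()
        # Preferential attachment: four distinct existing nodes, drawn
        # with probability proportional to their current degree.
        x, y, z, w = (int(v) for v in rng.choice(n, size=4, replace=False, p=p))
        a, b, c = n, n + 1, n + 2
        new_links = [(a, b, x), (b, c, y), (a, z), (c, w)]
        hyperlinks.extend(new_links)
        degree.extend([0, 0, 0])
        for link in new_links:
            for node in link:
                degree[node] += 1
    return hyperlinks, degree

def incidence_matrix(hyperlinks, n_nodes):
    """Unsigned incidence matrix R (nodes x hyperlinks)."""
    R = np.zeros((n_nodes, len(hyperlinks)), dtype=int)
    for i, link in enumerate(hyperlinks):
        R[list(link), i] = 1
    return R

hyperlinks, degree = grow_hypergraph(steps=50)
R = incidence_matrix(hyperlinks, len(degree))
```

Under these assumptions every new hyperlink touches exactly one existing node, so two hyperlinks never share more than one node and the hypergraph stays linear. The sizes produced by this sketch are not expected to reproduce the numbers reported in the paper.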

Using this model (with the seed and the growing element in Figure 4), we generate a hypergraph $H$ with 1015 nodes and 1510 hyperlinks, which is stored in the unsigned incidence matrix $R$. By formula (3), we compute the adjacency matrix of the line graph $l(H)$. The line graph⁴ of the generated hypergraph has 1510 nodes and 32031 links. The degree $D_H$ of a random node of a hypergraph is defined as the number of hyperlinks which are incident to that node, and it is equal to the size of the corresponding community. The degree distribution $\Pr(D_H = k)$ of the generated hypergraph is therefore the community size distribution, and it strictly follows a power-law distribution. The degree of a random node of the line graph is denoted as $D_{l(H)}$, and we show in Figure 5 that the degree distribution $\Pr(D_{l(H)} = k)$ of the line graph approximately follows a power-law distribution.

As the most popular online social networking site in the Netherlands, Hyves has more than 10 million users, which means that more than half of the Dutch population are using Hyves. Nearly half of Hyves users make their profiles open to the public. From the open profiles we can see some information about users, including the companies, schools, colleges, clubs and other organizations to which they belong. By using a breadth-first search we found 17619 users claiming that they belong to some communities. The total number of these communities is 10326. We make a network with the 17619 users as nodes, in which two users are connected by a link when they belong to the same community. We denote the size of a community as $S_c$, which is defined as the total number of individuals belonging to that community. We compute the properties of the Hyves social network and of the line graph of the hypergraph generated by our hypergraph model with preferential attachment. As shown in Table 3, both of these networks have a high clustering coefficient, a positive assortativity coefficient, a short average path length and a similar exponent of the power-law degree distribution, although the line graph is much smaller than the Hyves social network. As depicted in Figure 5(a) and (b), the community size of the Hyves social network follows a power-law distribution with exponent $\alpha = -1.88$, and the degree distribution of the Hyves social network can also be fitted by a power-law function with exponent $\alpha = -0.88$. Figure 5(c) and (d) show that the power-law degree distribution of the generated hypergraph, with $\alpha = -2.5$, is quite similar to the community size distribution of Figure 5(a), and that the exponent of the power-law degree distribution of the line graph, $\alpha = -0.76$, is very close to the exponent in Figure 5(b). Table 3 and Figure 5 show that our hypergraph model with preferential attachment has the common properties of real-world social networks, in addition to incorporating community structure and community overlap.
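An exponent such as those quoted above can be estimated by fitting a straight line on log-log axes. The sketch below shows one simple way to do this with a least-squares fit; it is an assumption for illustration, since the text does not state which fitting procedure was used, and the function name `fit_power_law` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_power_law(x, y):
    """Least-squares fit of y = beta * x**alpha on log-log axes."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = (x > 0) & (y > 0)          # logarithms need positive values
    alpha, log_beta = np.polyfit(np.log(x[keep]), np.log(y[keep]), 1)
    return alpha, np.exp(log_beta)

# Synthetic check with a known exponent (not data from the paper).
x = np.arange(1, 101, dtype=float)
y = 3.0 * x ** -1.88
alpha, beta = fit_power_law(x, y)
```

On noisy empirical distributions a plain log-log regression can be biased, so the fitted exponent should be read as a rough estimate.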

6 Conclusion 

We have modeled social networks with overlapping communities by hypergraphs and the line graphs of hypergraphs. The hyperlinks and nodes represent the individuals and the communities respectively. If an individual belongs to several communities, the corresponding nodes are connected by the corresponding hyperlink. Since the line graphs of hypergraphs are just simple graphs or weighted graphs, we can apply current network analysis techniques. We defined the overlapping depth $k$ of an individual by the number of communities that overlap in that individual, and we proved that the minimum adjacency eigenvalue of the line graphs of hypergraphs is not smaller than $-k_{\max}$, where $k_{\max}$ is the maximum overlapping depth of the whole network. We established a network model which, for the first time, incorporates overlapping community structures with tunable overlapping parameters.

⁴ This line graph is unweighted, since the hypergraph we have generated is linear.
Figure 5: The community size distribution of the Hyves social network (a) and the degree distribution of the generated hypergraph (c), and the degree distribution of the Hyves social network (b) and the degree distribution of the line graph of the generated hypergraph (d). They are all fitted by the power-law function $f(x) = \beta x^{\alpha}$, with $\alpha = -1.88$ (a), $\alpha = -2.5$ (c), $\alpha = -0.88$ (b), $\alpha = -0.76$ (d).



By comparing our model with the online social network Hyves, we have shown that our network model 
possesses the common properties of large social networks. 